Athletes have embraced new technologies to improve and enjoy their sporting activities. For example, Strava (https://www.strava.com/) is a mobile app and website designed for runners and cyclers to track their routes. Surfers are no different. While there are many apps geared towards understanding surf conditions (e.g., http://www.surfline.com/, http://magicseaweed.com/), and more recently, sensors to track individual rides (http://www.traceup.com/), there is a clear void in the middle ground.
I view this middle ground as a simple, personalized application that learns from each individuals lifetime surfing experience, and predicts the quality of a surf session based on simple inputs. In other words - I seek to help answer age-old questions for surfers, like - should I go surfing today? Where should I go surfing today? But the answers to these questions are not generic (e.g., from surfing websites), but rather, tailored to the individual.
After each session, the surfer inputs a few key metrics about their surf. These would include, for example:
The user inputs are used to determine the conditions for each session, including:
using the following data sources:
The user inputs and ocean conditions are used for two purposes:
I used a dataset of surf sessions over three years in Monterey, California for demonstration. This surf log was then integrated with hourly buoy data from Monterey Bay from 2013-present, as well as hourly tidal heights for Monterey bay over the same period.
I used the following packages in R:
Below, I plot the monthly number of surf sessions:
Figure 1. Number of surf session per month over a three year period in Central California
And here I plot the spatial distribution of the surf sessions, summarized in the following ways:
Figure 2. Visualizing a surf log from Monterey Bay, California. The colored sites are ones that will be used in model predictions. Note that some sites are infrequently visited but are worth the trip, like Capitola.
The real benefit of the app will be to inform the user whether a particular site, given the current conditions, will be worth surfing. The statistical model will take the current NOAA, tide, and weather conditions, and integrate them with the historical user data to predict a metric of quality. As a simple starting point, I fit the following generalized linear model, using a poisson distribution because I was modeling a count outcome (the number of rides per session i):
\[Rides_{i} = Poisson(\mu_{i})\] \[E(Rides_{i}) = var(Rides_{i}) = \mu_{i}\]
\[log(\mu_{i}) = \alpha + \beta_{h}Height_{i} + \beta_{p}Period_{i} + \beta_{d}Direction_{i} + \beta_{t}Tide_{i}\]
where Height is swell height (WVHT; m), Period is the dominant swell period (DPD; s), Direction is the mean wave direction (MWD; degrees), and Tide is the tidal height (TideHeight; m).
I fit the model to the central California surf log for each site separately. I can use the model to predict the number of rides as a function of each variable (while holding all others at their median value), and thus visualize the partial effects for one site below:
Figure 3. Partial effects of each predictor from the generalized linear model fit to the surf log data from one site, Asilomar.
These partial effects plots suggest the central California Surfer has the best session (in terms of ride count) when the swell height is small (< 4 ft) and the swell direction is from the northwest (> 300). To a lesser extent, central California surfer surfs more waves at lower tide and when the swell is larger, but these effects are marginal.
I used this modeling approach to predict the number of rides at hourly intervals for each of five sites over a three year period. Here I focus on the ability of the model to hindcast surf quality in order to validate and fine-tune the modeling approach. Ultimately, however, the goal will be to forecast future surf quality given the current and predicted ocean conditions.
For simplicity, I will focus on five randomly selected days for which there was a surf session, and compare the predicted number of rides (at all of the sites) with the observed number of rides at a single site.
Figure 5. Predicted number of rides on five randomly selected surf days. The lines represent the predictions, and the single large point on each panel represents the observed number of rides. Predictions of zero rides have been omitted for clarity.
On some dates, the model fit the data reasonably well- for example, the predictions for 2014, 2016, and 2017 were very similar to the actual number of rides. The best site, for a given time, was chosen on several occasions, but the timing and location could have been better to catch more waves (e.g., 2014, 2016).
In summary, I plot the observations against predictions below:
Figure 6. Observed number of rides plotted against the predicted number of rides.
The model appears to fit the extremes better than intermediate values, but a few more sessions are needed (especially at Capitola!) to improve confidence in these estimates.
This analysis is based on an example surf log, integrated with hourly swell and tide conditions over a three year period - and demonstrates reasonable proof of concept. However, there is room for improvement in many respects. For example: